[FLINK-39094][table-planner] Avoid creating duplicate function instances in code generation#27613
dylanhz wants to merge 2 commits into apache:master
```scala
      contextArgs: Seq[String] = null): String = {
    val classQualifier = function.getClass.getName
    val fieldTerm = CodeGenUtils.udfFieldName(function)
    // check if function has been added before to avoid duplicate function instances
```
I know the PR description says that existing tests are sufficient. It would be great to add a method-level unit test that exercises the logic of this specific change.
I added unit tests in `CodeGeneratorContextTest` that directly assert `references.size` after calling `addReusableFunction` with the same or different functions, covering the stateless dedup, stateful dedup, and stateful non-dedup cases.
```scala
  // set of function instance term that will be added only once
  private val reusableFunctionTerms: mutable.HashSet[String] = mutable.HashSet[String]()
```
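To illustrate how this set interacts with the existing bookkeeping, here is a minimal Scala sketch. `ToyCodeGenContext`, `MyUpper`, `addTwice`, and the `dedupFunctions` flag are hypothetical names, and the class models only three collections: `references` (one entry per stored object), plus insertion-ordered member and `open()` statement sets. It assumes, as the discussion suggests, that each unguarded call stores one object copy while identical statements collapse in a `LinkedHashSet`; it is not Flink's actual `CodeGeneratorContext`.

```scala
import scala.collection.mutable

// Hypothetical, heavily simplified model of the bookkeeping in
// CodeGeneratorContext; names mirror the PR, but this is not Flink's code.
class ToyCodeGenContext(dedupFunctions: Boolean) {
  // One entry per object handed to the generated class (a deep copy in Flink).
  val references: mutable.ArrayBuffer[AnyRef] = mutable.ArrayBuffer.empty[AnyRef]
  // Insertion-ordered sets: identical statements are kept only once.
  val memberStatements: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty[String]
  val openStatements: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty[String]
  // Field terms of functions that were already added (the set this PR introduces).
  private val reusableFunctionTerms: mutable.HashSet[String] = mutable.HashSet.empty[String]

  def addReusableFunction(function: AnyRef, fieldTerm: String): String = {
    // HashSet.add returns false when the term is already present.
    val firstTime = reusableFunctionTerms.add(fieldTerm)
    if (firstTime || !dedupFunctions) {
      references += function // stands in for the serialize/deserialize deep copy
      memberStatements += s"private transient ${function.getClass.getName} $fieldTerm;"
      openStatements += s"$fieldTerm.open(context);"
    }
    fieldTerm
  }
}

final class MyUpper // hypothetical stateless UDF

// Adds the same function instance twice, with or without the new check.
def addTwice(dedup: Boolean): ToyCodeGenContext = {
  val ctx = new ToyCodeGenContext(dedup)
  val f = new MyUpper
  ctx.addReusableFunction(f, "function_MyUpper")
  ctx.addReusableFunction(f, "function_MyUpper")
  ctx
}

val before = addTwice(dedup = false)
val after = addTwice(dedup = true)
println(s"before: references=${before.references.size}, open=${before.openStatements.size}")
// prints "before: references=2, open=1"
println(s"after: references=${after.references.size}, open=${after.openStatements.size}")
// prints "after: references=1, open=1"
```

With the check disabled, adding the same function twice stores two references but still emits a single `open()` statement, which is why an `open()` counter alone cannot tell the two versions apart; `references.size` can.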
How can we be sure of this?
Can we add a test with a custom UDF that keeps an internal counter, checking that it was invoked only once?
> How can we be sure of this?
The dedup key (`fieldTerm`) is derived from `functionIdentifier()`, which is the same key used by `addReusableObjectInternal` to generate the member and init statements, so the dedup is consistent with existing behavior.
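A content-derived key is what lets equal functions share one field. The sketch below is a hypothetical approximation: it assumes the identifier combines a class name with a digest of the instance's serialized state (here, an MD5 over the Java-serialized field values of a case class), which may differ in detail from Flink's `functionIdentifier()`. `ToUpper`, `Scale`, and `fieldTerm` are illustrative names. The three checks mirror the stateless dedup, stateful dedup, and stateful non-dedup cases mentioned in the review.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.security.MessageDigest

// Hypothetical UDFs modelled as case classes so their state is easy to enumerate.
case class ToUpper()          // stateless: no fields
case class Scale(factor: Int) // stateful: behaviour depends on `factor`

// Assumption: approximates a content-based identifier by hashing the
// serialized field values; this is NOT Flink's functionIdentifier() code.
def fieldTerm(f: Product): String = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  // Serialize each field value (boxed to AnyRef) in declaration order.
  f.productIterator.foreach(v => out.writeObject(v.asInstanceOf[AnyRef]))
  out.close()
  val md5 = MessageDigest.getInstance("MD5").digest(bytes.toByteArray)
  s"function_${f.productPrefix}_" + md5.map(b => "%02x".format(b)).mkString
}

val statelessDedup = fieldTerm(ToUpper()) == fieldTerm(ToUpper())
val statefulDedup = fieldTerm(Scale(2)) == fieldTerm(Scale(2))
val statefulDistinct = fieldTerm(Scale(2)) != fieldTerm(Scale(3))
println(s"$statelessDedup $statefulDedup $statefulDistinct") // prints "true true true"
```

Keying on content rather than object identity means two separately constructed but equal instances also share a field, while changing any field value produces a new term and therefore a separate instance.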
> Can we add a test with a custom UDF that keeps an internal counter, checking that it was invoked only once?

I added unit tests in `CodeGeneratorContextTest` that directly assert `references.size` after calling `addReusableFunction` with the same or different functions, covering the stateless dedup, stateful dedup, and stateful non-dedup cases.
As for a counter-based test: `addReusableObjectInternal` creates instances via `InstantiationUtil.deserializeObject`, which bypasses constructors, and the `open()` statements were already deduplicated by a `LinkedHashSet` before this fix, so neither a constructor counter nor an `open()` counter can distinguish the behavior before and after the change. Let me know if you have a better idea.
What about different arguments?
Good job on function performance.
@flinkbot run azure
This fix targets redundant deep copies in
What is the purpose of the change
Remove duplicate function instances in code generation.
Brief change log
Add a `reusableFunctionTerms` set in `CodeGeneratorContext` to cache function field terms, preventing redundant deep copies when the same function is added multiple times, similar to reusable converters and type serializers.
Verifying this change
Existing tests are enough.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation